Efficient Algorithm for δ-Approximate Jumbled Pattern Matching

Authors

  • Iván Castellanos
  • Yoan J. Pinzón
Abstract

The Jumbled Pattern Matching problem consists of finding substrings that can be permuted to equal a given pattern. Similarly, the δ-Approximate Jumbled Pattern Matching problem asks for substrings equivalent to a permutation of the given pattern, but allows a vector of possible errors δ. Here we provide a new, efficient solution to the δ-Approximate Jumbled Pattern Matching problem using indexing tables and bit vectors which, according to the experimental results, is about 1.5 to 3.5 times faster than the solution based on wavelet trees. The speed-up depends mainly on the size of the alphabet. We also present solutions to other problems related to δ-Approximate Jumbled Pattern Matching, such as the All Matching problem, where all the occurrences of a given pattern in the text must be computed while allowing an error, and the Min-Error problem, where the objective is to find the occurrences that are closest to the pattern.
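To make the problem statement concrete, here is a minimal Python sketch of a naive sliding-window check. It is not the indexing-table and bit-vector method the abstract describes. It assumes that a match is a window of the same length as the pattern whose per-character counts differ from the pattern's counts by at most the corresponding entry of δ. The function name `delta_jumbled_matches` and the dictionary representation of δ are hypothetical choices made for illustration.

```python
from collections import Counter


def delta_jumbled_matches(text, pattern, delta):
    """Return the start indices of windows of len(pattern) in text whose
    character counts are within delta of the pattern's character counts.

    delta maps a character to its allowed count error; characters that
    are absent from delta get an allowed error of 0 (assumption).
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []

    target = Counter(pattern)      # character counts of the pattern
    window = Counter(text[:m])     # character counts of the current window
    alphabet = set(text) | set(pattern)

    def within_delta():
        # Every character's count must be within its allowed error.
        return all(abs(window[c] - target[c]) <= delta.get(c, 0)
                   for c in alphabet)

    hits = [0] if within_delta() else []
    for i in range(m, len(text)):
        window[text[i]] += 1       # character entering the window
        window[text[i - m]] -= 1   # character leaving the window
        if within_delta():
            hits.append(i - m + 1)
    return hits
```

With an empty δ this reduces to exact jumbled matching: `delta_jumbled_matches("abba", "ab", {})` reports the windows "ab" and "ba". Each window costs time proportional to the alphabet size in this sketch, which is one reason an indexed approach can be attractive.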


Similar resources

Tuning Algorithms for Jumbled Matching

We consider the problem of jumbled matching where the objective is to find all permuted occurrences of a pattern in a text. Besides exact matching we study approximate matching where each occurrence is allowed to contain at most k wrong or superfluous characters. We present online algorithms applying bit-parallelism to both types of jumbled matching. Most of our algorithms are variations of ear...


On Tuning the (α, δ)-sequential-sampling Algorithm for δ-approximate Matching with α-bounded Gaps in Musical Sequence

In this paper we present a new efficient algorithm for the δ-approximate matching problem with α-bounded gaps which arises in many questions concerning musical information retrieval and musical analysis. Our presented algorithm is an efficient variant of the (δ, α)-SequentialSampling algorithm (Cantone et al., 2003), recently introduced by the authors. An extensive comparison with the other sol...


On Hardness of Jumbled Indexing

Jumbled indexing is the problem of indexing a text T for queries that ask whether there is a substring of T matching a pattern represented as a Parikh vector, i.e., the vector of frequency counts for each character. Jumbled indexing has garnered a lot of interest in the last four years; for a partial list see [2, 6, 13, 16, 17, 20, 22, 24, 26, 30, 35, 36]. There is a naive algorithm that prepro...


Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs

We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.


Jumbled Matching with SIMD

Jumbled pattern matching addresses the problem of finding all permuted occurrences of a substring in a text. We introduce two improved algorithms for exact jumbled matching of short patterns. Our solutions apply SIMD (Single Instruction Multiple Data) computation in order to quickly filter the text. One of them utilizes the equal any operation and the other searches for the least frequent chara...



Publication date: 2015